Bring the Washington Nationals Back to MLB Championship Contention
Nationals Team Level Statistics
1. 2019 Season and 2023 Season Records
First, let’s get an overview of the Washington Nationals’ season records in 2019 and 2023. A side-by-side comparison shows that the 2023 team performed much worse than the 2019 team: it suffered more blowout losses and recorded fewer wins by margins greater than 5 runs. The 2023 season shows no specific seasonal pattern. Some teams, for example, start to slide after a mid-season trade or a significant injury, but no such turning point is apparent in 2023. In contrast, the 2019 team improved and developed better chemistry after about one-third of the season, around 50 games. Overall, the 2023 team’s struggles lasted the entire season rather than a single stretch.
Code
# Load the data
library(plotly)
library(tidyverse)
record_2019 <- read_csv("../data/2019_record.csv") # https://www.baseball-reference.com/teams/WSN/2019-schedule-scores.shtml

# Create the margin graphs by selecting the columns and calculating margins as R-RA
record_2019_v1 <- record_2019 %>%
  select('W/L', 'R', 'RA', 'Gm#', 'Opp') %>%
  mutate(Margin = R - RA) %>%
  mutate(`W/L` = ifelse(R > RA, "W", "L"))

# Add a hover text column
record_2019_v1 <- record_2019_v1 %>%
  mutate(hover_text = paste("Opponent Team:", Opp, "<br>Margin:", Margin, "<br>Result:", `W/L`))

# Create the ggplot object
p_2019 <- ggplot(record_2019_v1, aes(x = `Gm#`, y = Margin, fill = `W/L`, text = hover_text)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("W" = "green", "L" = "red")) +
  xlab('Game Played') +
  labs(title = "2019 Season Nationals Game Results") +
  theme_minimal()

# Convert to plotly object for interactivity
plotly_2019 <- ggplotly(p_2019, tooltip = "text") # Ensure tooltips are added

# Print the Plotly object
plotly_2019
Figure 1: Margin Graph for Washington Nationals Records in 2019. The chart displays the result, run margin, and opposing team for each game of the 2019 season. The x-axis represents the game number, and the y-axis shows the run margin (runs scored minus runs allowed); wins are shown in green and losses in red. Data was obtained from https://www.baseball-reference.com/teams/WSN/2019-schedule-scores.shtml.
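One way to check the observation that the 2019 team turned a corner around game 50 is to compare the win percentage before and after a cutoff game. The sketch below is only an illustration: the toy data frame stands in for record_2019_v1, and the helper name split_record and all values are assumptions, not part of the original analysis.

```r
# Toy stand-in for record_2019_v1: game number, runs scored (R), runs allowed (RA).
# The values are made up for illustration only.
toy_record <- data.frame(
  Gm = 1:8,
  R  = c(2, 3, 1, 4, 6, 5, 7, 3),
  RA = c(5, 4, 6, 2, 1, 3, 2, 4)
)

# Hypothetical helper: win percentage up to and after a cutoff game.
split_record <- function(record, cutoff) {
  win   <- record$R > record$RA
  early <- record$Gm <= cutoff
  c(before = mean(win[early]), after = mean(win[!early]))
}

split_result <- split_record(toy_record, cutoff = 4)
print(split_result)
```

With the real game log, the same idea would use the `Gm#` column and a cutoff of 50.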
Code
# Load the data
library(plotly)
library(tidyverse)
record_2023 <- read_csv("../data/2023_record.csv") # https://www.baseball-reference.com/teams/WSN/2023-schedule-scores.shtml

# Create the margin graphs by selecting the columns and calculating margins as R-RA
record_2023_v1 <- record_2023 %>%
  select('W/L', 'R', 'RA', 'Gm#', 'Date', 'Opp') %>%
  mutate(Margin = R - RA) %>%
  mutate(`W/L` = ifelse(R > RA, "W", "L")) %>%
  mutate(hover_text = paste("Opponent Team:", Opp, "<br>Margin:", Margin, "<br>Result:", `W/L`)) # Add a hover text column

# Unused alternative: calculate average margin by month
# record_2023_v1 <- record_2023 %>% select('W/L','R','RA','Gm#','Date') %>% mutate(Margin=R-RA) %>% mutate(`W/L`=ifelse(R>RA,"W","L")) %>% mutate(Date = mdy(paste(Date, "2023")),Month = format(Date, "%b")) %>% group_by(Month) %>% summarize(Avg_Margin =mean(Margin,na.rm=TRUE), `W/L`=ifelse(Avg_Margin>0,'W','L'))

# Create the ggplot object
p_2023 <- ggplot(record_2023_v1, aes(x = `Gm#`, y = Margin, fill = `W/L`, text = hover_text)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("W" = "green", "L" = "red")) +
  xlab('Game Played') +
  labs(title = "2023 Season Nationals Game Results") +
  theme_minimal()

# Convert to plotly object for interactivity
plotly_2023 <- ggplotly(p_2023, tooltip = "text") # Ensure tooltips are added

# Print the Plotly object
plotly_2023
Figure 2: Margin Graph for Washington Nationals Records in 2023. The chart displays the result, run margin, and opposing team for each game of the 2023 season. The x-axis represents the game number, and the y-axis shows the run margin (runs scored minus runs allowed); wins are shown in green and losses in red. Data was obtained from https://www.baseball-reference.com/teams/WSN/2023-schedule-scores.shtml.
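The blowout comparison between the two seasons can also be expressed as a simple count. The following is a minimal sketch, assuming a blowout means a game decided by more than 5 runs; the helper name count_blowouts and the margin values are hypothetical and only stand in for the Margin columns computed above.

```r
# Hypothetical helper: count wins and losses decided by more than `threshold` runs.
count_blowouts <- function(margin, threshold = 5) {
  c(blowout_wins   = sum(margin > threshold),
    blowout_losses = sum(margin < -threshold))
}

# Made-up margins (R - RA) standing in for the Margin column of each season.
toy_margin_2019 <- c(7, -2, 6, 1, -6, 9, 3)
toy_margin_2023 <- c(-8, 2, -7, 6, -1, -9, -3)

# One row per season, one column per blowout type.
blowouts <- rbind(`2019` = count_blowouts(toy_margin_2019),
                  `2023` = count_blowouts(toy_margin_2023))
print(blowouts)
```

Applied to record_2019_v1$Margin and record_2023_v1$Margin, the same table would put numbers behind the visual impression from Figures 1 and 2.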
A weaker record can have several causes. One is that the opposing teams became much stronger. In MLB, a team does not play an equal number of games against each of the other 29 teams; the Washington Nationals play more games against teams in their own division. Therefore, if a division rival improved significantly after 2019, it would have a direct and significant impact on the Nationals’ record.
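To make the unbalanced schedule concrete, the share of games played against division rivals can be computed directly from the opponent column of a game log. This is a small sketch on made-up data; the vector toy_opp is a stand-in for record_2023$Opp, and the abbreviations are assumed to follow the Baseball-Reference style.

```r
# Made-up opponent column standing in for record_2023$Opp.
toy_opp <- c("ATL", "ATL", "MIA", "NYM", "PHI", "PHI", "LAD", "CHC", "ATL", "SDP")

# The Nationals' four NL East rivals.
nl_east <- c("ATL", "MIA", "NYM", "PHI")

# Games per opponent, most frequent first, and the share played inside the division.
games_by_opp   <- sort(table(toy_opp), decreasing = TRUE)
division_share <- mean(toy_opp %in% nl_east)

print(games_by_opp)
print(division_share)
```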
2. Opponent Teams Records
Code
# Load the packages
library(plotly)
library(tidyverse)

# Load the 2023 game log
record_2023 <- read_csv("../data/2023_record.csv") # https://www.baseball-reference.com/teams/WSN/2023-schedule-scores.shtml
# Load the 2023 standings
status_2023 <- read.csv("../data/2023_stats.csv") # https://www.baseball-reference.com/leagues/majors/2023-standings.shtml

# Create a lookup table mapping each team's full name to its abbreviation and logo.
# Baseball-Reference abbreviates Tampa Bay as TBR and Washington as WSN.
mlb_teams <- data.frame(
  Full_Name = c(
    "Arizona Diamondbacks", "Atlanta Braves", "Baltimore Orioles", "Boston Red Sox", "Chicago Cubs",
    "Chicago White Sox", "Cincinnati Reds", "Cleveland Guardians", "Colorado Rockies", "Detroit Tigers",
    "Houston Astros", "Kansas City Royals", "Los Angeles Angels", "Los Angeles Dodgers", "Miami Marlins",
    "Milwaukee Brewers", "Minnesota Twins", "New York Mets", "New York Yankees", "Oakland Athletics",
    "Philadelphia Phillies", "Pittsburgh Pirates", "San Diego Padres", "San Francisco Giants", "Seattle Mariners",
    "St. Louis Cardinals", "Tampa Bay Rays", "Texas Rangers", "Toronto Blue Jays", "Washington Nationals"
  ),
  Abbreviation = c(
    "ARI", "ATL", "BAL", "BOS", "CHC", "CHW", "CIN", "CLE", "COL", "DET",
    "HOU", "KCR", "LAA", "LAD", "MIA", "MIL", "MIN", "NYM", "NYY", "OAK",
    "PHI", "PIT", "SDP", "SFG", "SEA", "STL", "TBR", "TEX", "TOR", "WSN"
  ),
  Logo = c(
    "https://content.sportslogos.net/logos/54/50/full/arizona_diamondbacks_logo_primary_20123733.png",
    "https://content.sportslogos.net/logos/54/51/full/atlanta_braves_logo_primary_20221869.png",
    "https://content.sportslogos.net/logos/53/52/full/baltimore_orioles_logo_primary_20195398.png",
    "https://content.sportslogos.net/logos/53/53/full/boston_red_sox_logo_primary_20097510.png",
    "https://content.sportslogos.net/logos/54/54/full/chicago_cubs_logo_primary_19792956.png",
    "https://content.sportslogos.net/logos/53/55/full/chicago_white_sox_logo_primary_19911413.png",
    "https://content.sportslogos.net/logos/54/56/full/cincinnati_reds_logo_primary_20133208.png",
    "https://content.sportslogos.net/logos/53/6804/full/cleveland_guardians_logo_primary_2022_sportslogosnet-5538.png",
    "https://content.sportslogos.net/logos/54/58/full/colorado_rockies_logo_primary_20171892.png",
    "https://content.sportslogos.net/logos/53/59/full/detroit_tigers_logo_primary_20162109.png",
    "https://content.sportslogos.net/logos/53/4929/full/houston_astros_logo_primary_20137038.png",
    "https://content.sportslogos.net/logos/53/62/full/kansas_city_royals_logo_primary_20198736.png",
    "https://content.sportslogos.net/logos/53/6521/full/4389_los_angeles_angels-primary-2016.png",
    "https://content.sportslogos.net/logos/54/63/full/los_angeles_dodgers_logo_primary_20127886.png",
    "https://content.sportslogos.net/logos/54/3637/full/miami_marlins_logo_primary_20194007.png",
    "https://content.sportslogos.net/logos/54/64/full/6474_milwaukee_brewers-primary-2020.png",
    "https://content.sportslogos.net/logos/53/65/full/minnesota_twins_logo_primary_20102311.png",
    "https://content.sportslogos.net/logos/54/67/full/m01gfgeorgvbfw15fy04alujm.png",
    "https://content.sportslogos.net/logos/53/68/full/new_york_yankees_logo_primary_19685115.png",
    "https://content.sportslogos.net/logos/53/69/full/6xk2lpc36146pbg2kydf13e50.png",
    "https://content.sportslogos.net/logos/54/70/full/philadelphia_phillies_logo_primary_20193931.png",
    "https://content.sportslogos.net/logos/54/71/full/1250_pittsburgh_pirates-primary-2014.png",
    "https://content.sportslogos.net/logos/54/73/full/7517_san_diego_padres-primary-2020.png",
    "https://content.sportslogos.net/logos/54/74/full/san_francisco_giants_logo_primary_20002208.png",
    "https://content.sportslogos.net/logos/53/75/full/seattle_mariners_logo_primary_19933809.png",
    "https://content.sportslogos.net/logos/54/72/full/3zhma0aeq17tktge1huh7yok5.png",
    "https://content.sportslogos.net/logos/53/2535/full/tampa_bay_rays_logo_primary_20196768.png",
    "https://content.sportslogos.net/logos/53/77/full/ajfeh4oqeealq37er15r3673h.png",
    "https://content.sportslogos.net/logos/53/78/full/toronto_blue_jays_logo_primary_20208446.png",
    "https://content.sportslogos.net/logos/54/578/full/washington_nationals_logo_primary_20117280.png"
  )
)

# Select the W, L and team name
status_2023_v1 <- status_2023 %>%
  select(Tm, W, L)

# Get the game count and average margin per opponent for 2023
record_2023_v2 <- record_2023 %>%
  select('W/L', 'R', 'RA', 'Opp') %>%
  mutate(Margin = R - RA) %>%
  group_by(Opp) %>%
  summarize(Game_Count_2023 = n(), Margin_avg = mean(Margin))

# Add a rank column to create color bins for better visuals
record_2023_v2 <- record_2023_v2 %>%
  mutate(Rank = rank(Game_Count_2023, ties.method = "min"))

# Join the tables to get the games played, W, L for each team
record_2023_v3 <- record_2023_v2 %>%
  left_join(mlb_teams, by = c("Opp" = "Abbreviation")) %>%
  left_join(status_2023_v1, by = c("Full_Name" = "Tm"))

# Create custom hover text
# record_2023_v3$hover_text <- paste("Team: ", record_2023_v3$Full_Name, "<br>Game Count: ", record_2023_v3$Game_Count_2023, '<br><img src="', record_2023_v3$Logo, '" width="50" height="50" />', sep = "")
record_2023_v3$hover_text <- paste("Team: ", record_2023_v3$Full_Name, "<br>Game Count: ", record_2023_v3$Game_Count_2023, sep = "")

# Load the 2019 game log
record_2019 <- read_csv("../data/2019_record.csv") # https://www.baseball-reference.com/teams/WSN/2019-schedule-scores.shtml
# Load the 2019 standings
status_2019 <- read.csv("../data/2019_stats.csv") # https://www.baseball-reference.com/leagues/majors/2019-standings.shtml

# Select the W, L and team name
status_2019_v1 <- status_2019 %>%
  select(Tm, W, L)

# Get the game count and average margin per opponent for 2019
record_2019_v2 <- record_2019 %>%
  select('W/L', 'R', 'RA', 'Opp') %>%
  mutate(Margin = R - RA) %>%
  group_by(Opp) %>%
  summarize(Game_Count_2019 = n(), Margin_avg = mean(Margin))

# Add a rank column to create color bins for better visuals
record_2019_v2 <- record_2019_v2 %>%
  mutate(Rank = rank(Game_Count_2019, ties.method = "min"))

# Join the tables to get the games played, W, L for each team
record_2019_v3 <- record_2019_v2 %>%
  left_join(mlb_teams, by = c("Opp" = "Abbreviation")) %>%
  left_join(status_2019_v1, by = c("Full_Name" = "Tm"))

# Create custom hover text
# record_2019_v3$hover_text <- paste("Team: ", record_2019_v3$Full_Name, "<br>Game Count: ", record_2019_v3$Game_Count_2019, '<br><img src="', record_2019_v3$Logo, '" width="50" height="50" />', sep = "")
record_2019_v3$hover_text <- paste("Team: ", record_2019_v3$Full_Name, "<br>Game Count: ", record_2019_v3$Game_Count_2019, sep = "")

# Create the 2023 plot
fig <- plot_ly() %>%
  add_markers(
    data = record_2023_v3, x = ~W, y = ~Margin_avg,
    marker = list(size = ~Game_Count_2023 * 100, sizemode = 'area',
                  color = ~Rank, colorscale = 'Viridis', alpha = 0.4),
    mode = 'markers',
    text = ~Opp, # Display team names inside the bubbles
    textfont = list(size = 12),
    textposition = 'middle center', # Position the text in the center of the bubbles
    hoverinfo = 'text',
    hovertext = ~paste("Team:", Full_Name, "<br>Game Count: ", Game_Count_2023),
    name = '2023',
    visible = TRUE # Ensure the 2023 data is visible by default
  )

# Add the 2019 data to the same plot
fig <- fig %>%
  add_markers(
    data = record_2019_v3, x = ~W, y = ~Margin_avg,
    marker = list(size = ~Game_Count_2019 * 100, sizemode = 'area',
                  color = ~Rank, colorscale = 'Viridis', alpha = 0.4),
    mode = 'markers+text',
    text = ~Opp, # Display team names inside the bubbles
    textposition = 'middle center', # Position the text in the center of the bubbles
    hoverinfo = 'text',
    textfont = list(size = 12),
    hovertext = ~paste("Team:", Full_Name, "<br>Game Count: ", Game_Count_2019),
    name = '2019',
    visible = FALSE # Start with this trace hidden
  )

# Customize the layout with a toggle button
fig <- fig %>%
  layout(
    title = 'Wins and Margins Plot By Opponent Teams',
    xaxis = list(title = 'Wins', range = c(45, 110)),
    yaxis = list(title = 'Avg Margin'),
    updatemenus = list(
      list(
        type = 'buttons',
        direction = 'left',
        x = -0.1,
        xanchor = 'left',
        y = 1.1,
        yanchor = 'top',
        buttons = list(
          list(method = "update",
               args = list(list(visible = c(TRUE, FALSE)),
                           list(title = "Wins and Margins Plot By Opponent Teams in 2023")),
               label = "2023"),
          list(method = "update",
               args = list(list(visible = c(FALSE, TRUE)),
                           list(title = "Wins and Margins Plot By Opponent Teams in 2019")),
               label = "2019")
        )
      )
    )
  )

# Show the plot
fig
Figure 3: Scatter Bubble Chart for Opponent Team Average Margins and Wins in 2019 and 2023. The x-axis represents each opponent’s season win total, and the y-axis represents the average run margin per game (the Nationals’ runs scored minus runs allowed) in games against that opponent. The size of each bubble indicates the number of games played against the Washington Nationals, and the color groups opponents by the rank of that game count. Data was obtained from https://www.baseball-reference.com/leagues/majors/2023-standings.shtml, https://www.baseball-reference.com/leagues/majors/2019-standings.shtml, https://www.baseball-reference.com/teams/WSN/2023-schedule-scores.shtml, and https://www.baseball-reference.com/teams/WSN/2019-schedule-scores.shtml.
Toggling between 2019 and 2023 shows that the number of games played against each opponent changed slightly, but the Washington Nationals still played the most games against the same four teams: the Miami Marlins, Philadelphia Phillies, Atlanta Braves, and New York Mets. The number of games those four teams won remained quite consistent between 2019 and 2023, but the winning margins were much smaller in 2023. This indicates a slightly worse performance for the four teams the Nationals played the most. It indirectly suggests that the Nationals’ poorer record in 2023 is not directly due to their opponents’ strength, but more likely due to their own diminished capabilities. Now, let’s dig into more statistics from the team’s operational management level.
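The year-over-year comparison for the four division rivals can also be tabulated rather than read off the chart. The sketch below joins two per-opponent summaries and computes the change in average margin; the data frames and their values are made up for illustration and only mimic the shape of record_2019_v2 and record_2023_v2.

```r
# Made-up per-opponent summaries standing in for record_2019_v2 and record_2023_v2.
toy_2019 <- data.frame(Opp = c("ATL", "MIA", "NYM", "PHI"),
                       Margin_avg_2019 = c(-0.5, 2.0, 0.5, 1.0))
toy_2023 <- data.frame(Opp = c("ATL", "MIA", "NYM", "PHI"),
                       Margin_avg_2023 = c(-2.5, -1.0, 0.0, -0.5))

# Join the two seasons by opponent and compute the change in average margin.
margin_change <- merge(toy_2019, toy_2023, by = "Opp")
margin_change$change <- margin_change$Margin_avg_2023 - margin_change$Margin_avg_2019
print(margin_change)
```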
3. Payroll by Positions for the 2019 and 2023 Seasons
Code
# Load the data
library(plotly)
library(tidyverse)
# Source: https://legacy.baseballprospectus.com/compensation/?team=WAS
salary_2019 <- read_csv("../data/Salary_2019.csv")
# Source: https://www.spotrac.com/mlb/washington-nationals/payroll/2023/
salary_2023 <- read_csv("../data/Salary_2023.csv")

# Keep the position and salary columns
salary_2019 <- salary_2019 %>% dplyr::select("Pos", "Salary") %>% na.omit()
salary_2023 <- salary_2023 %>% dplyr::select("POS.", "BASE SALARY") %>% na.omit()

# Check positions in each dataset
# unique(salary_2019$Pos)
# unique(salary_2023$`POS.`)

# Replace the 2023 "RP/CL" with "RP" and convert the dollar strings to numbers
salary_2023 <- salary_2023 %>%
  mutate(Pos = ifelse(`POS.` != "RP/CL", `POS.`, "RP")) %>%
  mutate(Salary_2023 = as.numeric(gsub("\\$", "", gsub(",", "", `BASE SALARY`)))) %>%
  dplyr::select(Pos, Salary_2023)

# Convert the 2019 dollar strings to numbers
salary_2019 <- salary_2019 %>%
  mutate(Salary_2019 = as.numeric(gsub("\\$", "", gsub(",", "", Salary))))

# Group by position, then calculate the total and each position's share (2023)
salary_2023_1 <- salary_2023 %>%
  group_by(Pos) %>%
  summarize(salary_sum_2023 = sum(as.numeric(Salary_2023)))
salary_2023_1$total_2023 <- sum(salary_2023_1$salary_sum_2023)
salary_2023_1$percentage_2023 <- salary_2023_1$salary_sum_2023 / salary_2023_1$total_2023

# Group by position, then calculate the total and each position's share (2019)
salary_2019_1 <- salary_2019 %>%
  group_by(Pos) %>%
  summarize(salary_sum_2019 = sum(as.numeric(Salary_2019)))
salary_2019_1$total_2019 <- sum(salary_2019_1$salary_sum_2019)
salary_2019_1$percentage_2019 <- salary_2019_1$salary_sum_2019 / salary_2019_1$total_2019

# Choose the top 4 and combine the rest (2019)
top_four_2019 <- salary_2019_1 %>%
  arrange(desc(percentage_2019)) %>%
  slice(1:4)
others_2019 <- salary_2019_1 %>%
  arrange(desc(percentage_2019)) %>%
  slice(5:n()) %>%
  summarise(Pos = 'Others',
            salary_sum_2019 = sum(salary_sum_2019),
            total_2019 = first(total_2019),
            percentage_2019 = sum(percentage_2019))
salary_2019_final <- rbind(top_four_2019, others_2019)

# Rename the positions
salary_2019_final <- salary_2019_final %>%
  mutate(Pos = case_when(
    Pos == "SP" ~ "Starting Pitcher",
    Pos == "1B" ~ "First Baseman",
    Pos == "3B" ~ "Third Baseman",
    Pos == "RP" ~ "Relief Pitcher",
    TRUE ~ "Others"))

# Get the total salary for 2019
salary_total_2019 <- mean(salary_2019_final$total_2019, na.rm = TRUE)

# Choose the top 4 and combine the rest (2023)
top_four_2023 <- salary_2023_1 %>%
  arrange(desc(percentage_2023)) %>%
  slice(1:4)
others_2023 <- salary_2023_1 %>%
  arrange(desc(percentage_2023)) %>%
  slice(5:n()) %>%
  summarise(Pos = 'Others',
            salary_sum_2023 = sum(salary_sum_2023),
            total_2023 = first(total_2023),
            percentage_2023 = sum(percentage_2023))
salary_2023_final <- rbind(top_four_2023, others_2023)

# Rename the positions
salary_2023_final <- salary_2023_final %>%
  mutate(Pos = case_when(
    Pos == "SP" ~ "Starting Pitcher",
    Pos == "1B" ~ "First Baseman",
    Pos == "2B" ~ "Second Baseman",
    Pos == "RP" ~ "Relief Pitcher",
    TRUE ~ "Others"))

# Get the total salary for 2023
salary_total_2023 <- mean(salary_2023_final$total_2023, na.rm = TRUE)

# Create pie chart for the first dataset
pie1 <- plot_ly(
  salary_2019_final, labels = ~Pos, values = ~salary_sum_2019, type = 'pie',
  textinfo = 'percent',
  text = ~paste("2019 Total Payroll:", format(salary_total_2019, big.mark = ",", scientific = FALSE),
                "<br>Position:", salary_2019_final$Pos,
                "<br>Salary:", format(salary_2019_final$salary_sum_2019, big.mark = ",", scientific = FALSE)),
  hoverinfo = 'text',
  textposition = 'inside', # Position the text inside the slices
  marker = list(
    colors = c('#0072B2', '#F0E442', '#D55E00', '#CC79A7', '#009E73'), # Assign new colors
    line = list(color = '#FFFFFF', width = 2) # Set slice borders
  ),
  domain = list(x = c(0, 0.6), y = c(0, 1)),
  name = "2019 Payroll") %>%
  layout(title = "2019 Payroll Allocations", showlegend = TRUE)

# Create pie chart for the second dataset
pie2 <- plot_ly(
  salary_2023_final, labels = ~Pos, values = ~salary_sum_2023, type = 'pie',
  textinfo = 'percent',
  text = ~paste("2023 Total Payroll:", format(salary_total_2023, big.mark = ",", scientific = FALSE),
                "<br>Position:", salary_2023_final$Pos,
                "<br>Salary:", format(salary_2023_final$salary_sum_2023, big.mark = ",", scientific = FALSE)),
  hoverinfo = 'text',
  textposition = 'inside', # Position the text inside the slices
  marker = list(
    colors = c('#0072B2', '#F0E442', '#FF0000', '#CC79A7', '#009E73'), # Assign new colors
    line = list(color = '#FFFFFF', width = 2) # Set slice borders
  ),
  domain = list(x = c(0.6, 1), y = c(0, 0.5)),
  name = "2023 Payroll") %>%
  layout(title = "2023 Payroll Allocations", showlegend = TRUE)

# Combine the pie charts side by side
subplot(pie1, pie2, nrows = 1, shareX = TRUE, shareY = TRUE, titleX = TRUE) %>%
  layout(title = "Payroll Pie Charts, 2019(Left) Versus 2023(Right)")
Figure 4: Pie Chart for Washington Nationals Payroll by Positions in 2019 and 2023. The two charts split the total payroll into five categories: the four positions with the largest payroll share are kept, and the remaining positions are grouped into the ‘Others’ category. The size of each pie chart is proportional to the payroll size in 2019 and 2023, respectively. The 2019 data is obtained from https://legacy.baseballprospectus.com/compensation/?team=WAS, and the 2023 data is obtained from https://www.spotrac.com/mlb/washington-nationals/payroll/2023/.
Code
# Load the data
library(plotly)
library(tidyverse)
# Source: https://legacy.baseballprospectus.com/compensation/?team=WAS
salary_2019 <- read_csv("../data/Salary_2019.csv")
# Source: https://www.spotrac.com/mlb/washington-nationals/payroll/2023/
salary_2023 <- read_csv("../data/Salary_2023.csv")

# Keep the position and salary columns
salary_2019 <- salary_2019 %>% dplyr::select("Pos", "Salary") %>% na.omit()
salary_2023 <- salary_2023 %>% dplyr::select("POS.", "BASE SALARY") %>% na.omit()

# Check positions in each dataset
# unique(salary_2019$Pos)
# unique(salary_2023$`POS.`)

# Replace the 2023 "RP/CL" with "RP" and convert the dollar strings to numbers
salary_2023 <- salary_2023 %>%
  mutate(Pos = ifelse(`POS.` != "RP/CL", `POS.`, "RP")) %>%
  mutate(Salary_2023 = as.numeric(gsub("\\$", "", gsub(",", "", `BASE SALARY`)))) %>%
  dplyr::select(Pos, Salary_2023)

# Convert the 2019 dollar strings to numbers
salary_2019 <- salary_2019 %>%
  mutate(Salary_2019 = as.numeric(gsub("\\$", "", gsub(",", "", Salary))))

# Group by position, then calculate the total and each position's share (2023)
salary_2023_1 <- salary_2023 %>%
  group_by(Pos) %>%
  summarize(salary_sum_2023 = sum(as.numeric(Salary_2023)))
salary_2023_1$total_2023 <- sum(salary_2023_1$salary_sum_2023)
salary_2023_1$percentage_2023 <- salary_2023_1$salary_sum_2023 / salary_2023_1$total_2023

# Group by position, then calculate the total and each position's share (2019)
salary_2019_1 <- salary_2019 %>%
  group_by(Pos) %>%
  summarize(salary_sum_2019 = sum(as.numeric(Salary_2019)))
salary_2019_1$total_2019 <- sum(salary_2019_1$salary_sum_2019)
salary_2019_1$percentage_2019 <- salary_2019_1$salary_sum_2019 / salary_2019_1$total_2019

# Choose the top 4 and combine the rest (2019)
top_four_2019 <- salary_2019_1 %>%
  arrange(desc(percentage_2019)) %>%
  slice(1:4)
others_2019 <- salary_2019_1 %>%
  arrange(desc(percentage_2019)) %>%
  slice(5:n()) %>%
  summarise(Pos = 'Others',
            salary_sum_2019 = sum(salary_sum_2019),
            total_2019 = first(total_2019),
            percentage_2019 = sum(percentage_2019))
salary_2019_final <- rbind(top_four_2019, others_2019)

# Rename the positions; 3B (2019) and 2B (2023) share the label "Other Baseman" so the join below matches them
salary_2019_final <- salary_2019_final %>%
  mutate(Pos = case_when(
    Pos == "SP" ~ "Starting Pitcher",
    Pos == "1B" ~ "First Baseman",
    Pos == "3B" ~ "Other Baseman",
    Pos == "RP" ~ "Relief Pitcher",
    TRUE ~ "Others"))

# Get the total salary for 2019
salary_total_2019 <- mean(salary_2019_final$total_2019, na.rm = TRUE)

# Choose the top 4 and combine the rest (2023)
top_four_2023 <- salary_2023_1 %>%
  arrange(desc(percentage_2023)) %>%
  slice(1:4)
others_2023 <- salary_2023_1 %>%
  arrange(desc(percentage_2023)) %>%
  slice(5:n()) %>%
  summarise(Pos = 'Others',
            salary_sum_2023 = sum(salary_sum_2023),
            total_2023 = first(total_2023),
            percentage_2023 = sum(percentage_2023))
salary_2023_final <- rbind(top_four_2023, others_2023)

# Rename the positions
salary_2023_final <- salary_2023_final %>%
  mutate(Pos = case_when(
    Pos == "SP" ~ "Starting Pitcher",
    Pos == "1B" ~ "First Baseman",
    Pos == "2B" ~ "Other Baseman",
    Pos == "RP" ~ "Relief Pitcher",
    TRUE ~ "Others"))

# Get the total salary for 2023
salary_total_2023 <- mean(salary_2023_final$total_2023, na.rm = TRUE)

# Join the two seasons and compute the three change metrics
salary_combined <- salary_2019_final %>%
  left_join(salary_2023_final, by = 'Pos') %>%
  mutate(change_percent = (percentage_2023 - percentage_2019)) %>%
  mutate(change_dollar = salary_sum_2023 - salary_sum_2019) %>%
  mutate(percentage_shift = change_dollar / salary_sum_2019) %>%
  dplyr::select("Pos", "change_percent", "change_dollar", "percentage_shift")

# Create separate hover texts for each metric
hover_text_dollars <- with(salary_combined, paste("Position: ", Pos, "<br>", "Change in Dollars: $", format(change_dollar, big.mark = ",")))
hover_text_percent <- with(salary_combined, paste("Position: ", Pos, "<br>", "Change Percent: ", format(change_percent * 100, digits = 2, nsmall = 2), "%"))
hover_text_shift <- with(salary_combined, paste("Position: ", Pos, "<br>", "Percentage Shift: ", format(percentage_shift * 100, digits = 2, nsmall = 2), "%"))

# Create the initial plot with the default display set to 'change_dollar'
p <- plot_ly(
  data = salary_combined, x = ~Pos, y = ~change_dollar, type = 'bar', name = 'Change in Dollars',
  text = ~hover_text_dollars, # Assign initial hover text
  hoverinfo = "text", # Specify that hoverinfo should show text
  marker = list(color = 'rgb(49,130,189)'),
  textposition = 'none')

# Add the update menu for toggling
p <- p %>%
  layout(
    title = "Washington Nationals Payroll Changes from 2019 to 2023",
    xaxis = list(title = "Position"),
    yaxis = list(title = "Change in Dollars"), # Initial y-axis title
    updatemenus = list(
      list(
        type = "buttons",
        direction = "left",
        x = 0,
        xanchor = "left",
        y = -0.1,
        yanchor = "top",
        buttons = list(
          list(method = "update",
               args = list(list("y" = list(salary_combined$change_dollar), "text" = list(hover_text_dollars)),
                           list("yaxis.title" = "Change in Dollars")),
               label = "Change in Dollars"),
          list(method = "update",
               args = list(list("y" = list(salary_combined$change_percent), "text" = list(hover_text_percent)),
                           list("yaxis.title" = "Change in Proportion")),
               label = "Change in Proportion"),
          list(method = "update",
               args = list(list("y" = list(salary_combined$percentage_shift), "text" = list(hover_text_shift)),
                           list("yaxis.title" = "Proportion Shift Based on 2019 Payroll")),
               label = "Proportion Shift Based on 2019 Dollars")
        )
      )
    )
  )

# Print the plot
p
Figure 5: Bar Chart for Washington Nationals Payroll Changes by Positions between 2019 and 2023. This bar chart complements the pie charts by displaying three statistics. The first is the dollar value change for each position from 2019 to 2023. The second is the change in each position’s share of the total payroll. The third is the dollar value change divided by that position’s 2019 payroll, which expresses the change as a percentage shift. The 2019 data is obtained from https://legacy.baseballprospectus.com/compensation/?team=WAS, and the 2023 data is obtained from https://www.spotrac.com/mlb/washington-nationals/payroll/2023/.
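The three statistics in Figure 5 are easy to confuse, so a small worked example may help. The sketch below reuses the column names from the payroll code (change_dollar, change_percent, percentage_shift) on a made-up payroll table; the dollar figures are illustrative assumptions, not the Nationals’ actual numbers.

```r
# Made-up payroll totals by position group (dollars), for illustration only.
toy_payroll <- data.frame(
  Pos = c("Starting Pitcher", "Relief Pitcher", "Others"),
  salary_sum_2019 = c(100, 40, 60),
  salary_sum_2023 = c(60, 20, 20)
)

# Share of each season's total payroll.
share_2019 <- toy_payroll$salary_sum_2019 / sum(toy_payroll$salary_sum_2019)
share_2023 <- toy_payroll$salary_sum_2023 / sum(toy_payroll$salary_sum_2023)

# 1) Dollar change, 2) change in payroll share, 3) dollar change relative to 2019.
toy_payroll$change_dollar    <- toy_payroll$salary_sum_2023 - toy_payroll$salary_sum_2019
toy_payroll$change_percent   <- share_2023 - share_2019
toy_payroll$percentage_shift <- toy_payroll$change_dollar / toy_payroll$salary_sum_2019
print(toy_payroll)
```

In this toy table, starting pitchers lose 40 in dollars yet gain 10 percentage points of payroll share, because the total payroll fell faster than their salaries did. A position group can therefore shrink in dollars while growing in proportion.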
The pie charts display the total payroll by position for 2019 and 2023, and the bar chart supplements them by showing the dollar or percentage change from 2019 to 2023. Two findings stand out. First, the Washington Nationals spent almost twice as much in 2019 as they did in 2023, and the bar chart shows a decrease in dollar value at every position. This raises significant questions: What caused the large decrease in total payroll from 2019 to 2023? What occurred at the team operational level? Second, the Nationals allocated a much larger proportion of their payroll to pitchers in 2023, which suggests an attempt to keep investing in star pitchers even as the overall budget shrank. According to the proportion shift graph, the baseman positions experienced the largest decrease relative to their 2019 payroll, indicating that these positions were not prioritized in the 2023 team-building process. This is a significant issue because, ideally, a team distributes its spending evenly across positions to build a more balanced roster. Although an outstanding pitcher is key to a competitive team, fans might also want to see star batters who can produce more home runs. In general, people prefer offense over defense.
4. Team Operation Statistics
Code
library(tidyverse)
library(readxl)
library(plotly)

# Load the data
stadium <- read_csv("../data/Nationals_Stadium_Stats.csv") # https://www.baseball-reference.com/teams/WSN/attend.shtml

# Drop year 2020 due to the Covid impact, and the 2024 season
stadium <- stadium %>%
  filter(Year != 2020 & Year != 2024)

# Convert 'Est. Payroll' to numeric after removing commas and dollar signs
stadium$`Est. Payroll` <- as.numeric(gsub("[,$]", "", stadium$`Est. Payroll`))

# Read the ticket price file
# Get the names of all sheets in the Excel file
sheets1 <- excel_sheets("../data/statistic_id203506_washington-nationals-average-ticket-price-2006-2023.xlsx")
# Read the second tab (its name is the second entry of 'sheets1')
ticket <- read_excel("../data/statistic_id203506_washington-nationals-average-ticket-price-2006-2023.xlsx", sheet = sheets1[2])
# Drop the unused rows
ticket <- ticket %>% na.omit()
# Replace column names
names(ticket) <- c('Year', 'Ticket Price')
# Remove year 2020 due to Covid impact and the years before 2008 to match the time scale
ticket <- ticket %>%
  filter(Year != 2020 & Year >= 2008) %>%
  mutate(Year = as.numeric(Year))

# Join the ticket price column to the stadium dataset
stadium_combined <- stadium %>%
  left_join(ticket, by = 'Year')

# Read the revenue file
# Get the names of all sheets in the Excel file
sheets2 <- excel_sheets("../data/statistic_id196692_washington-nationals-revenue-2001-2022.xlsx")
# Read the second tab (its name is the second entry of 'sheets2')
revenue <- read_excel("../data/statistic_id196692_washington-nationals-revenue-2001-2022.xlsx", sheet = sheets2[2])
# Drop the unused rows
revenue <- revenue %>% na.omit()
# Replace column names
names(revenue) <- c('Year', 'Revenue')
# Remove year 2020 due to Covid impact and the years before 2008 to match the time scale
revenue <- revenue %>%
  filter(Year != 2020 & Year >= 2008) %>%
  mutate(Year = as.numeric(Year))

# Join the revenue column to the stadium dataset
stadium_final <- stadium_combined %>%
  left_join(revenue, by = 'Year')

# Create a base plot with hover text
p <- plot_ly(stadium_final, x = ~Year, y = ~Attendance, type = 'scatter', mode = 'lines+markers', name = 'Attendance',
             text = ~paste("Year:", Year, "<br>Attendance:", format(Attendance, big.mark = ",", scientific = FALSE)),
             hoverinfo = 'text')

# Add other traces with their respective hover text
p <- p %>% add_trace(y = ~`Est. Payroll`, name = "Estimated Payroll", visible = FALSE,
                     text = ~paste("Year:", Year, "<br>Payroll:", format(`Est. Payroll`, big.mark = ",", scientific = FALSE)),
                     hoverinfo = 'text')
p <- p %>% add_trace(y = ~`Ticket Price`, name = "Average Ticket Price", visible = FALSE,
                     text = ~paste("Year:", Year, "<br>Price:", `Ticket Price`, "$"),
                     hoverinfo = 'text')
p <- p %>% add_trace(y = ~Revenue, name = "Revenue", visible = FALSE,
                     text = ~paste("Year:", Year, "<br>Revenue in Millions:", Revenue),
                     hoverinfo = 'text')
p <- p %>% add_trace(y = ~PPF, name = "Pitcher's Park Factor", visible = FALSE,
                     text = ~paste("Year:", Year, "<br>Factor Value:", PPF),
                     hoverinfo = 'text')
p <- p %>% add_trace(y = ~BPF, name = "Batter's Park Factor", visible = FALSE,
                     text = ~paste("Year:", Year, "<br>Factor Value:", BPF),
                     hoverinfo = 'text')

# Define layout and buttons for interactivity
final_plot <- p %>%
  layout(
    title = 'Interactive Visualization of Washington Nationals Statistics',
    xaxis = list(title = 'Year'),
    yaxis = list(title = 'Value'),
    updatemenus = list(
      list(
        type = 'buttons',
        direction = 'left',
        x = 0,
        xanchor = 'left',
        y = -0.2,
        yanchor = 'top',
        buttons = list(
          list(method = 'update', args = list(list(visible = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)), list(title = 'Attendance')), label = 'Attendance'),
          list(method = 'update', args = list(list(visible = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)), list(title = 'Estimated Payroll')), label = 'Payroll'),
          list(method = 'update', args = list(list(visible = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)), list(title = 'Average Ticket Price')), label = 'Ticket Price'),
          list(method = 'update', args = list(list(visible = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)), list(title = 'Revenue')), label = 'Revenue'),
          list(method = 'update', args = list(list(visible = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)), list(title = 'Pitcher\'s Park Factor')), label = 'PPF'),
          list(method = 'update', args = list(list(visible = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)), list(title = 'Batter\'s Park Factor')), label = 'BPF')
        )
      )
    )
  )

# Show the plot
final_plot
Figure 6: Line Chart for Washington Nationals Stadium, Revenue, and Payroll related statistics from 2008 to 2023. The chart contains six aspects: ‘Attendance’, ‘Estimated Payroll’, ‘Average Ticket Price’, ‘Revenue (millions)’, ‘Pitcher’s Park Factor’, and ‘Batter’s Park Factor’. Each aspect can be selected individually. The year 2020 is omitted because Covid shortened the season. For Pitcher’s Park Factor (PPF), a number greater than 100 suggests that the park is harder on pitchers and more hitter-friendly. For Batter’s Park Factor (BPF), a number greater than 100 suggests that the park is favorable for hitters, boosting offensive statistics. The ticket price data is obtained from https://www.statista.com/statistics/203506/washington-nationals-average-ticket-price/. The revenue data is obtained from https://www.statista.com/statistics/196692/revenue-of-the-washington-nationals-since-2006/. The other data is obtained from https://www.baseball-reference.com/teams/WSN/attend.shtml.
The last chart from the team operational perspective is a line chart of trends from 2008 to 2023. Attendance and payroll have both dropped significantly since 2019. Interestingly, however, revenue has remained steady. This suggests that the Nationals’ management team has intentionally reduced spending on the team. While this could be part of a strategic plan, it may harm fan loyalty. Over the same period, both park factors dropped below 100. The Nationals need to initiate marketing campaigns to bring more fans back to the ballpark, which could have a long-term impact on the team’s performance. In the next section, more statistics related to team offense and defense will be reviewed to see if any detailed strategies related to team building could be proposed.
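The contrast between falling attendance and steady revenue is easier to judge when both are expressed relative to the same baseline year. The sketch below computes the percent change against 2019 for each series; the helper name pct_vs_baseline and the figures in toy_ops are hypothetical stand-ins for the columns of stadium_final.

```r
# Made-up yearly figures standing in for columns of stadium_final.
toy_ops <- data.frame(
  Year = c(2019, 2021, 2022),
  Attendance = c(2000000, 1500000, 1600000),
  Revenue = c(300, 290, 310)
)

# Hypothetical helper: percent change of each value relative to the baseline year.
pct_vs_baseline <- function(values, years, baseline_year) {
  base <- values[years == baseline_year]
  100 * (values - base) / base
}

toy_ops$attendance_pct <- pct_vs_baseline(toy_ops$Attendance, toy_ops$Year, 2019)
toy_ops$revenue_pct    <- pct_vs_baseline(toy_ops$Revenue, toy_ops$Year, 2019)
print(toy_ops)
```

Putting the series on a common percentage scale also avoids comparing attendance in the millions with revenue in the hundreds on a single y-axis.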
Pitching Analysis: Team and Individual Levels
Code
# Data prep - Pitching
library(openxlsx)
library(plotly)
library(tidyverse)
library(fmsb)
library(ggplot2)
library(readxl)

# Read the datasets
pitching2019 <- read_excel("../data/2019Washington_Pitching_Individual.xlsx")
batting2019 <- read_excel("../data/2019Washington_Batting_Individual.xlsx")
pitching2023 <- read_excel("../data/2023Washington_Pitching_Individual.xlsx")
batting2023 <- read_excel("../data/2023Washington_Batting_Individual.xlsx")

# Print the first few rows of the data to confirm it's read correctly
# head(pitching2019)
# head(batting2019)
# head(pitching2023)
# head(batting2023)

# Convert columns to numeric, ignoring non-numeric values (coerce problems to NA)
pitching2019$ERA <- as.numeric(pitching2019$ERA)
pitching2019$WHIP <- as.numeric(pitching2019$WHIP)
pitching2019$SO9 <- as.numeric(pitching2019$SO9)
pitching2023$ERA <- as.numeric(pitching2023$ERA)
pitching2023$WHIP <- as.numeric(pitching2023$WHIP)
pitching2023$SO9 <- as.numeric(pitching2023$SO9)

# Handle NA values by using mean with na.rm = TRUE
pitching_data <- data.frame(
  metric = c("ERA", "WHIP", "SO9"),
  `2019` = c(mean(pitching2019$ERA, na.rm = TRUE), mean(pitching2019$WHIP, na.rm = TRUE), mean(pitching2019$SO9, na.rm = TRUE)),
  `2023` = c(mean(pitching2023$ERA, na.rm = TRUE), mean(pitching2023$WHIP, na.rm = TRUE), mean(pitching2023$SO9, na.rm = TRUE))
)

# Combine '2019' and '2023' for pitching & batting
# Add a 'Year' column to each dataset
# Filter on only SP and RP
pitching2019 <- pitching2019 %>% mutate(Year = 2019) %>% filter(Pos == "SP" | Pos == "RP")
# Filter on top 15 rank
batting2019 <- batting2019 %>% mutate(Year = 2019) %>% filter(Rk <= 15)
# Filter on only SP and RP
pitching2023 <- pitching2023 %>% mutate(Year = 2023) %>% filter(Pos == "SP" | Pos == "RP")
# Filter on top 15 rank
batting2023 <- batting2023 %>% mutate(Year = 2023) %>% filter(Rk <= 15)

# Combine the pitching datasets
combined_pitching <- rbind(pitching2019, pitching2023)
# Combine the batting datasets
combined_batting <- rbind(batting2019, batting2023)

# Drop Inf for ERA col in 'combined_pitching'
combined_pitching <- combined_pitching[is.finite(combined_pitching$ERA), ]

# Print the first few rows of the combined data to confirm it's combined correctly
# head(combined_pitching)
# head(combined_batting)

# Calculate the mean of the provided pitching stats for the team, one row per season
team_stats_2019 <- pitching2019 %>%
  summarise(ERA = mean(ERA, na.rm = TRUE), WHIP = mean(WHIP, na.rm = TRUE), SO9 = mean(SO9, na.rm = TRUE),
            FIP = mean(FIP, na.rm = TRUE), H = mean(H, na.rm = TRUE), R = mean(R, na.rm = TRUE))
team_stats_2023 <- pitching2023 %>%
  summarise(ERA = mean(ERA, na.rm = TRUE), WHIP = mean(WHIP, na.rm = TRUE), SO9 = mean(SO9, na.rm = TRUE),
            FIP = mean(FIP, na.rm = TRUE), H = mean(H, na.rm = TRUE), R = mean(R, na.rm = TRUE))

# Radar axes; the first category is repeated at the end to close the radar chart
categories <- c("ERA", "WHIP", "SO9", "FIP", "H", "R", "ERA")

# Normalization bounds taken from the combined 2019 + 2023 individual data
# (shared by both seasons, so they only need to be computed once)
max_values <- c(ERA = max(combined_pitching$ERA, na.rm = TRUE), WHIP = max(combined_pitching$WHIP, na.rm = TRUE),
                SO9 = max(combined_pitching$SO9, na.rm = TRUE), FIP = max(combined_pitching$FIP, na.rm = TRUE),
                H = max(combined_pitching$H, na.rm = TRUE), R = max(combined_pitching$R, na.rm = TRUE)) # Upper limit of poor performance
min_values <- c(ERA = min(combined_pitching$ERA, na.rm = TRUE), WHIP = min(combined_pitching$WHIP, na.rm = TRUE),
                SO9 = min(combined_pitching$SO9, na.rm = TRUE), FIP = min(combined_pitching$FIP, na.rm = TRUE),
                H = min(combined_pitching$H, na.rm = TRUE), R = min(combined_pitching$R, na.rm = TRUE)) # Lower limit of good performance

# Normalize the stats (Lower is better for ERA, WHIP, FIP, H, and R; higher is better for SO9)
# For ERA, WHIP, FIP, H, R we subtract from the max value to invert the scale
team_stats_normalized_2019 <- mutate(team_stats_2019,
  ERA = (max_values['ERA'] - ERA) / (max_values['ERA'] - min_values['ERA']),
  WHIP = (max_values['WHIP'] - WHIP) / (max_values['WHIP'] - min_values['WHIP']),
  SO9 = (SO9 - min_values['SO9']) / (max_values['SO9'] - min_values['SO9']),
  FIP = (max_values['FIP'] - FIP) / (max_values['FIP'] - min_values['FIP']),
  H = (max_values['H'] - H) / (max_values['H'] - min_values['H']),
  R = (max_values['R'] - R) / (max_values['R'] - min_values['R']))
values_2019 <- c(team_stats_normalized_2019$ERA, team_stats_normalized_2019$WHIP, team_stats_normalized_2019$SO9,
                 team_stats_normalized_2019$FIP, team_stats_normalized_2019$H, team_stats_normalized_2019$R)
values_2019 <- c(values_2019, values_2019[1]) # Add the first value to the end to close the radar chart

team_stats_normalized_2023 <- mutate(team_stats_2023,
  ERA = (max_values['ERA'] - ERA) / (max_values['ERA'] - min_values['ERA']),
  WHIP = (max_values['WHIP'] - WHIP) / (max_values['WHIP'] - min_values['WHIP']),
  SO9 = (SO9 - min_values['SO9']) / (max_values['SO9'] - min_values['SO9']),
  FIP = (max_values['FIP'] - FIP) / (max_values['FIP'] - min_values['FIP']),
  H = (max_values['H'] - H) / (max_values['H'] - min_values['H']),
  R = (max_values['R'] - R) / (max_values['R'] - min_values['R']))
values_2023 <- c(team_stats_normalized_2023$ERA, team_stats_normalized_2023$WHIP, team_stats_normalized_2023$SO9,
                 team_stats_normalized_2023$FIP, team_stats_normalized_2023$H, team_stats_normalized_2023$R)
values_2023 <- c(values_2023, values_2023[1]) # Add the first value to the end to close the radar chart

# Keep the original (un-normalized) values for the hover text
values_2019_org <- unlist(team_stats_2019)
values_2019_org <- c(values_2019_org, values_2019_org[1]) # Add the first value to the end to close the radar chart
values_2023_org <- unlist(team_stats_2023)
values_2023_org <- c(values_2023_org, values_2023_org[1]) # Add the first value to the end to close the radar chart

# Create a radar chart in plotly with both 2019 and 2023 data
p1 <- plot_ly() %>%
  add_trace(
    type = 'scatterpolar', mode = 'markers+lines', fill = 'toself',
    r = values_2019, theta = categories,
    fillcolor = 'rgba(255, 0, 0, 0.5)', line = list(color = 'rgba(255, 0, 0, 0.5)'),
    hoverinfo = 'text',
    hovertext = ~paste("2019", "<br>Stat:", categories, "<br>Original Value:", round(values_2019_org, 3)),
    name = '2019'
  ) %>%
  add_trace(
    type = 'scatterpolar', mode = 'markers+lines', fill = 'toself',
    r = values_2023, theta = categories,
    fillcolor = 'rgba(0, 0, 255, 0.5)', line = list(color = 'rgba(0, 0, 255, 0.5)'),
    hoverinfo = 'text',
    hovertext = ~paste("2023", "<br>Stat:", categories, "<br>Original Value:", round(values_2023_org, 3)),
    name = '2023'
  ) %>%
  layout(
    polar = list(radialaxis = list(visible = TRUE, range = c(0, 1))),
    showlegend = TRUE,
    title = "Team Pitching Stats Radar Chart (2019 vs 2023)"
  )

# Print the plot
p1
Figure 7: The radar chart provides a comparative analysis of the Washington Nationals’ pitching performance across the 2019 and 2023 seasons, with the 2019 data represented by a pink area and the 2023 data by a purple area. This visualization indicates that the team’s pitching was more effective in 2019 across all six key metrics. After normalization, the Fielding Independent Pitching metric, which estimates a pitcher’s effectiveness at preventing home runs, walks, and other negative outcomes, shows a significant gap favoring 2019. Additionally, the WHIP metric suggests better control and efficiency for 2019, making it harder for opposing teams to score. A wider spread in the pink area along the SO9 axis also indicates a higher strikeout rate per nine innings, marking another aspect of pitching dominance in 2019. Higher values on the radar chart represent superior statistical performance. Data sources: https://www.baseball-reference.com/teams/WSN/2023.shtml#all_team_pitching and https://www.baseball-reference.com/teams/WSN/2019.shtml#all_team_pitching.
SO9: Strikeouts Per 9 Innings
WHIP: Walks and Hits Per Inning Pitched
ERA: Earned Run Average
R: Runs Allowed
H: Hits Allowed
FIP: Fielding Independent Pitching
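For reference, the rate statistics in this glossary are commonly defined as follows. This is a sketch of the standard formulas rather than something stated in the report itself: IP is innings pitched, ER earned runs, BB walks, HBP hit batters, SO strikeouts, and C_FIP is a league-wide constant that changes each season. The exact constant and any rounding used by Baseball-Reference are assumptions here.

```latex
\begin{aligned}
\mathrm{ERA}  &= 9 \cdot \frac{\mathrm{ER}}{\mathrm{IP}}, &
\mathrm{WHIP} &= \frac{\mathrm{BB} + \mathrm{H}}{\mathrm{IP}}, &
\mathrm{SO9}  &= 9 \cdot \frac{\mathrm{SO}}{\mathrm{IP}}, \\[4pt]
\mathrm{FIP}  &= \frac{13\,\mathrm{HR} + 3\,(\mathrm{BB} + \mathrm{HBP}) - 2\,\mathrm{SO}}{\mathrm{IP}} + C_{\mathrm{FIP}}
\end{aligned}
```

For ERA, WHIP, and FIP a lower value is better, while for SO9 a higher value is better, which is why the radar chart inverts the scale for every metric except SO9.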
After analyzing the six key pitching metrics for the Washington Nationals, a downward trend in performance is evident. Other than “Hits Allowed”, which did not change significantly, every metric was clearly worse in 2023 than in 2019. Notably, the data suggests a decline in individual pitchers’ skills, which warrants a detailed examination in the individual players’ section to identify specific areas of concern and potential improvements. This in-depth analysis is crucial for understanding the underlying issues contributing to the overall dip in pitching effectiveness.
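The radar chart relies on min-max normalization, with the scale inverted for metrics where a lower raw value is better. The sketch below isolates that step in a small helper. The function name `normalize_stat` and all numbers are hypothetical and chosen only for illustration; they are not taken from the report's data.

```r
# Min-max scaling onto [0, 1]. Set higher_is_better = FALSE to invert the
# scale for stats such as ERA, where a smaller raw value is the better one.
normalize_stat <- function(x, lo, hi, higher_is_better = TRUE) {
  stopifnot(hi > lo)                    # guard against a zero-width range
  scaled <- (x - lo) / (hi - lo)
  if (higher_is_better) scaled else 1 - scaled
}

# Hypothetical bounds and team means (illustration only)
era_score <- normalize_stat(3.00, lo = 1.50, hi = 9.00, higher_is_better = FALSE)
so9_score <- normalize_stat(9.0, lo = 4.0, hi = 14.0)

print(c(ERA = era_score, SO9 = so9_score))
```

Note that `1 - (x - lo) / (hi - lo)` is algebraically the same as `(hi - x) / (hi - lo)`, the form used in the code cell above, so a larger radar value always means better performance regardless of the metric's direction.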
Figure 8: Radar chart for the top five pitchers by appearances from the 2019 Nationals and the 2023 Nationals. The radar plot covers five individual pitching metrics: ‘Earned Run Average’, ‘Walks plus Hits per Inning Pitched’, ‘Hits per 9 Innings’, ‘Strikeouts per 9 Innings’, and ‘Walks per 9 Innings’. The pitcher statistics were obtained from https://www.baseball-reference.com/leagues/majors/2023-standard-pitching.shtml
This comparison highlights a contrast in pitcher performance between 2023 and 2019. Three of the top five pitchers from 2023 performed significantly worse than their 2019 counterparts. Although the remaining two 2023 pitchers improved on certain metrics, the number of hits allowed remained relatively consistent. This suggests that the 2023 pitching staff was weaker overall than the 2019 staff.
Batting Analysis: Team and Individual Levels
Code
# Washington Nationals 2019 vs 2023 Analysis
# Data prep
library(openxlsx)
library(plotly)
library(tidyverse)
library(fmsb)
library(ggplot2)
library(readxl)

# Read the datasets
pitching2019 <- read_excel("../data/2019Washington_Pitching_Individual.xlsx")
batting2019 <- read_excel("../data/2019Washington_Batting_Individual.xlsx")
pitching2023 <- read_excel("../data/2023Washington_Pitching_Individual.xlsx")
batting2023 <- read_excel("../data/2023Washington_Batting_Individual.xlsx")

# Print the first few rows of the data to confirm it's read correctly
# head(pitching2019)
# head(batting2019)
# head(pitching2023)
# head(batting2023)

# Convert columns to numeric, ignoring non-numeric values (coerce problems to NA)
pitching2019$ERA <- as.numeric(pitching2019$ERA)
pitching2019$WHIP <- as.numeric(pitching2019$WHIP)
pitching2019$SO9 <- as.numeric(pitching2019$SO9)
pitching2023$ERA <- as.numeric(pitching2023$ERA)
pitching2023$WHIP <- as.numeric(pitching2023$WHIP)
pitching2023$SO9 <- as.numeric(pitching2023$SO9)

# Combine '2019' and '2023' for pitching & batting
# Add a 'Year' column to each dataset
# Filter on only SP and RP
pitching2019 <- pitching2019 %>% mutate(Year = 2019) %>% filter(Pos == "SP" | Pos == "RP")
# Filter on top 15 rank
batting2019 <- batting2019 %>% mutate(Year = 2019) %>% filter(Rk <= 15)
# Filter on only SP and RP
pitching2023 <- pitching2023 %>% mutate(Year = 2023) %>% filter(Pos == "SP" | Pos == "RP")
# Filter on top 15 rank
batting2023 <- batting2023 %>% mutate(Year = 2023) %>% filter(Rk <= 15)

# Combine the pitching datasets
combined_pitching <- rbind(pitching2019, pitching2023)
# Combine the batting datasets
combined_batting <- rbind(batting2019, batting2023)

# Drop Inf for ERA col in 'combined_pitching'
combined_pitching <- combined_pitching[is.finite(combined_pitching$ERA), ]

# Print the first few rows of the combined data to confirm it's combined correctly
# head(combined_pitching)
# head(combined_batting)

# Define the batting statistics we are interested in
batting_stats <- c("BA", "OBP", "SLG", "OPS", "OPS+", "HR")

# Calculate the mean for each batting statistic for 2019 and 2023 separately
batting_stats_2019 <- combined_batting %>%
  filter(Year == 2019) %>%
  summarise(across(all_of(batting_stats), ~ mean(.x, na.rm = TRUE)))
batting_stats_2023 <- combined_batting %>%
  filter(Year == 2023) %>%
  summarise(across(all_of(batting_stats), ~ mean(.x, na.rm = TRUE)))

# Calculate max and min values for normalization from the combined dataset
max_values_batting <- sapply(combined_batting[, batting_stats], max, na.rm = TRUE)
min_values_batting <- sapply(combined_batting[, batting_stats], min, na.rm = TRUE)

# Normalize the stats for each year (higher is better for all six batting stats)
normalized_batting_2019 <- as.data.frame(mapply(function(x, min, max) (x - min) / (max - min),
                                                batting_stats_2019, min_values_batting, max_values_batting))
normalized_batting_2023 <- as.data.frame(mapply(function(x, min, max) (x - min) / (max - min),
                                                batting_stats_2023, min_values_batting, max_values_batting))

# Add the first statistic at the end to close the radar chart loop
values_2019 <- unlist(normalized_batting_2019)
values_2019 <- c(values_2019, values_2019[1]) # Add the first value to the end to close the radar chart
values_2023 <- unlist(normalized_batting_2023)
values_2023 <- c(values_2023, values_2023[1]) # Add the first value to the end to close the radar chart
categories <- c(batting_stats, batting_stats[1]) # Add the first category to the end to close the loop

# Keep the original (un-normalized) values for the hover text
values_2019_org <- unlist(batting_stats_2019)
values_2019_org <- c(values_2019_org, values_2019_org[1]) # Add the first value to the end to close the radar chart
values_2023_org <- unlist(batting_stats_2023)
values_2023_org <- c(values_2023_org, values_2023_org[1]) # Add the first value to the end to close the radar chart

# Create a radar chart in plotly with both 2019 and 2023 data
p2 <- plot_ly() %>%
  add_trace(
    type = 'scatterpolar', mode = 'markers+lines', fill = 'toself',
    r = values_2019, theta = categories,
    fillcolor = 'rgba(255, 0, 0, 0.5)', line = list(color = 'rgba(255, 0, 0, 0.5)'),
    hoverinfo = 'text',
    hovertext = ~paste("2019", "<br>Stat:", categories, "<br>Original Value:", round(values_2019_org, 3)),
    name = '2019'
  ) %>%
  add_trace(
    type = 'scatterpolar', mode = 'markers+lines', fill = 'toself',
    r = values_2023, theta = categories,
    fillcolor = 'rgba(0, 0, 255, 0.5)', line = list(color = 'rgba(0, 0, 255, 0.5)'),
    hoverinfo = 'text',
    hovertext = ~paste("2023", "<br>Stat:", categories, "<br>Original Value:", round(values_2023_org, 3)),
    name = '2023'
  ) %>%
  layout(
    polar = list(radialaxis = list(visible = TRUE, range = c(0, 1))),
    showlegend = TRUE,
    title = "Team Batting Stats Radar Chart (2019 vs 2023)"
  )

# Print the plot
p2
Figure 9: The radar chart offers a side-by-side comparison of the Washington Nationals’ offensive performance using batting statistics from the 2019 and 2023 seasons. The 2019 batting achievements are represented by the pink region, while the purple region depicts the 2023 outcomes. After normalizing the data, it is evident that the team’s batting metrics in 2019 surpassed those in 2023 across all six key metrics. Specifically, the 2019 season shows higher values in SLG, OBP, BA, and OPS, suggesting a more potent offensive lineup capable of more effective hitting and run production. Additionally, the HR metric, which indicates power-hitting prowess, was also greater in 2019. The OPS+ values, which adjust for league and park variations, are relatively similar for both years, suggesting that when accounting for external factors, the team’s offensive efficiency has not significantly deviated from league norms. Data sources: https://www.baseball-reference.com/teams/WSN/2023.shtml#all_team_batting and https://www.baseball-reference.com/teams/WSN/2019.shtml#all_team_batting.
SLG: Slugging Percentage
OBP: On-Base Percentage
BA: Batting Average
HR: Home Runs
OPS: On-base Plus Slugging
OPS+: Adjusted On-base Plus Slugging (normalized across the league, with 100 being the average)
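For reference, the rate statistics in this glossary are commonly computed as shown below. This is a sketch of the standard definitions, not something spelled out in the report: AB is at-bats, BB walks, HBP hit by pitch, SF sacrifice flies, TB total bases, and the "lg" prefix marks the league-average value. The OPS+ line is the usual approximation; the park adjustment applied by Baseball-Reference is only indicated in words and its details are an assumption here.

```latex
\begin{aligned}
\mathrm{BA}  &= \frac{\mathrm{H}}{\mathrm{AB}}, \qquad
\mathrm{OBP}  = \frac{\mathrm{H} + \mathrm{BB} + \mathrm{HBP}}{\mathrm{AB} + \mathrm{BB} + \mathrm{HBP} + \mathrm{SF}}, \\[4pt]
\mathrm{SLG} &= \frac{\mathrm{TB}}{\mathrm{AB}}
              = \frac{\mathrm{1B} + 2\cdot\mathrm{2B} + 3\cdot\mathrm{3B} + 4\cdot\mathrm{HR}}{\mathrm{AB}}, \qquad
\mathrm{OPS}  = \mathrm{OBP} + \mathrm{SLG}, \\[4pt]
\mathrm{OPS{+}} &\approx 100 \cdot \left( \frac{\mathrm{OBP}}{\mathrm{lgOBP}} + \frac{\mathrm{SLG}}{\mathrm{lgSLG}} - 1 \right)
\quad \text{(with park-adjusted league averages)}
\end{aligned}
```

Unlike the pitching metrics, higher is better for all six batting statistics, so no axis of the batting radar chart needs to be inverted.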
The Nationals’ batting performance declined markedly in 2023, as shown by the gaps between 2019 and 2023 across the six metrics. This downturn suggests a need for a thorough examination of individual batters’ abilities to pinpoint the factors contributing to the stark decrease in batting effectiveness. Understanding the underlying issues, such as changes in player composition, injuries, or shifts in batting strategy, is crucial. This analysis will help identify specific areas for improvement and develop strategies to enhance overall performance, ensuring the team’s offensive capabilities are optimized in future seasons.
Figure 10: Radar chart for the top five batters from the 2019 Nationals and the 2023 Nationals. The radar plot covers five individual batting metrics: ‘Batting Average’, ‘On-Base Percentage’, ‘Adjusted On-base Plus Slugging (OPS+)’, ‘Slugging Percentage’, and ‘Hits’. The batter statistics were obtained from https://www.baseball-reference.com/leagues/majors/2019-standard-batting.shtml.
This comparison highlights a contrast in batter performance between 2023 and 2019. Four of the top five batters from 2023 performed slightly worse than their 2019 counterparts. However, the gap between the two seasons is larger in the team statistics than in the individual statistics of the top five, which indicates that the rotation players behind them need the most batting improvement.
Figure 11: Injury Timeline of the 2023 Washington Nationals: This bar graph presents an overview of the injuries sustained by the team across different positions during the 2023 season. Each bar represents the time frame a player was sidelined due to injury, offering insights into how the absence of key players, especially starting pitchers like Stephen Strasburg and relief pitchers like Tanner Rainey, has correlated with the team’s struggles. The visual highlights the critical periods where multiple injuries overlapped, further exacerbating the team’s challenges. This 2023 injury data is obtained from the source https://www.fangraphs.com/roster-resource/injury-report/nationals?timeframe=all&season=2023.
Injury is also a critical aspect to consider when evaluating a team’s performance, as it significantly influences team dynamics and strategy. An analysis was conducted on the impact of injuries on team performance through the development of a bar graph that details the injury timelines of each player. This graph tracks the start and expected recovery periods of injuries, providing clear insights into the availability of key players.
For the Washington Nationals specifically, the bar graph is interactive, so each player’s injury period can be inspected individually. A notable example is Stephen Strasburg, the 2019 World Series MVP. The chart shows that Strasburg sustained an injury in 2021 and was not expected to return until the end of May 2023; in the end, he did not participate in any games in 2023. His absence was likely a significant factor in the team’s downturn that year, particularly given his outstanding performance in 2019. The analysis also examined periods where multiple injuries overlapped, which further exacerbated the team’s challenges. This points to a broader pattern: player health is closely tied to the team’s ability to replicate the success of a championship season.
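One way to locate the periods where multiple injuries overlap is to count, for each calendar day, how many injured-list stints cover it. The sketch below uses base R only. The player labels and dates are placeholders invented for illustration, not the FanGraphs data behind Figure 11, and the column names `player`, `start`, and `end` are assumptions.

```r
# Hypothetical injured-list stints (placeholder names and dates, not real data)
injuries <- data.frame(
  player = c("Pitcher A", "Pitcher B", "Infielder C"),
  start  = as.Date(c("2023-04-01", "2023-05-10", "2023-05-20")),
  end    = as.Date(c("2023-06-15", "2023-05-31", "2023-07-04"))
)

# Every calendar day between the first injury start and the last return
days <- seq(min(injuries$start), max(injuries$end), by = "day")

# For each day, count the stints whose [start, end] interval contains it
n_out <- vapply(
  seq_along(days),
  function(i) sum(injuries$start <= days[i] & injuries$end >= days[i]),
  integer(1)
)

daily <- data.frame(day = days, n_out = n_out)

# Days on which the largest number of players were sidelined at once
peak <- daily[daily$n_out == max(daily$n_out), ]
print(range(peak$day))
```

With real data, the `daily` table could be plotted under the injury timeline to make the most depleted stretches of the season easier to spot.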
Proposed Strategies
Operation & Management
Invest More in the Fans. Provide more souvenirs or free team merchandise during games to boost attendance. Create a better soundtrack during the game to increase fan engagement. A louder, fuller ballpark could create a harsher environment for the away team and may, in turn, help the Pitcher’s Park Factor and Batter’s Park Factor.
Sign Star Batters. Strong pitching anchors the team’s run prevention, but fans generally prefer better offense. They want to see home runs. It is necessary to maintain a balance between pitchers and batters on the payroll to attract a larger audience.
Team & Players Improvement
Pitching Control. Choose pitchers with better control to reduce Walks and Hits per Inning Pitched (WHIP), and select pitchers with lower Fielding Independent Pitching (FIP) figures, since a lower FIP indicates better performance.
Improve Batting. All six aspects require improvements to return to championship levels. Investing more money in signing talented batters is essential. This strategy will enhance the team’s offensive capabilities and contribute significantly to overall success.
Others
Injuries. Adjust the game plan to help players stay healthy. Consistent player availability is key to a successful season. By keeping players in peak physical condition, the team can maintain a high level of performance throughout the season and increase its chances of success.